
Data Science by ODS.ai 🦜

⚙️ SWE-rebench: Nebius AI R&D team presents a new dataset for SWE tasks.

Researchers built an automated system that collects and validates thousands of real-world tasks from GitHub, designed for training and evaluating LLMs on software engineering tasks.

Main features of the system:
1️⃣ Automatic data collection: Continuously extracts issue-PR pairs from Python repositories.
2️⃣ LLM-based environment setup: An LLM analyzes each repository, generates installation instructions, and revises them when errors occur.
3️⃣ Execution-based validation: Each task is validated by automatically setting up its environment, running the tests, and freezing dependencies so the task is reproducible.
4️⃣ LLM quality annotation: Tasks are labeled for clarity, difficulty, and test correctness to support filtering.
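To make steps 1️⃣ and 3️⃣ more concrete, here is a minimal Python sketch under stated assumptions; it is not the SWE-rebench pipeline's actual code. It assumes issue-PR pairs are found via GitHub's closing keywords in the PR description ("fixes #42"), and that a task counts as valid when the reference fix turns at least one failing test into a passing one without breaking tests that passed before (the FAIL_TO_PASS / PASS_TO_PASS convention popularized by SWE-bench). The names `linked_issues`, `TestRun`, `classify`, and `is_valid_task` are hypothetical, and the container setup and dependency freezing are left out entirely.

```python
import re
from dataclasses import dataclass

# Assumption: issues are linked via GitHub closing keywords
# (close/closes/closed, fix/fixes/fixed, resolve/resolves/resolved),
# optionally followed by a colon, then an issue reference like "#123".
CLOSING_RE = re.compile(
    r"\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?):?\s*#(\d+)",
    re.IGNORECASE,
)


def linked_issues(pr_body: str) -> set[int]:
    """Return the issue numbers a PR description claims to resolve (step 1 sketch)."""
    return {int(n) for n in CLOSING_RE.findall(pr_body)}


@dataclass
class TestRun:
    """Hypothetical test outcomes (test id -> passed) for one candidate task."""
    before: dict[str, bool]  # with only the PR's test changes applied
    after: dict[str, bool]   # with the test changes plus the reference code fix


def classify(run: TestRun) -> tuple[list[str], list[str]]:
    """Split passing-after tests into FAIL_TO_PASS and PASS_TO_PASS lists."""
    # A test missing from `before` (e.g. not collected) is treated as failing.
    fail_to_pass = sorted(
        t for t, ok in run.after.items() if ok and not run.before.get(t, False)
    )
    pass_to_pass = sorted(
        t for t, ok in run.after.items() if ok and run.before.get(t, False)
    )
    return fail_to_pass, pass_to_pass


def is_valid_task(run: TestRun) -> bool:
    """Keep a task only if the fix is verifiable and breaks nothing (step 3 sketch)."""
    fail_to_pass, _ = classify(run)
    # Tests that passed before the fix but fail (or vanish) after it.
    broken = [t for t, ok in run.before.items() if ok and not run.after.get(t, False)]
    return bool(fail_to_pass) and not broken


# Usage with made-up data:
print(sorted(linked_issues("This PR fixes #42 and Closes: #7. See also #99.")))  # [7, 42]
run = TestRun(before={"test_a": False, "test_b": True},
              after={"test_a": True, "test_b": True})
print(classify(run), is_valid_task(run))  # (['test_a'], ['test_b']) True
```

Requiring at least one FAIL_TO_PASS test is what makes a task checkable: a model's patch can then be judged by whether those same tests flip to passing.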

Result:
SWE-rebench dataset: 21,000+ ready-to-use interactive tasks.
Continuous updates: Fresh data is added regularly.
Transparent evaluation: Tasks are used for the public SWE-rebench leaderboard.

🚀 SWE-rebench gives researchers and developers real, validated tasks for working with LLMs in the SWE field.

Technical report: arXiv
Dataset: SWE-rebench


1.7K views · May 29 at 15:03

tg-me.com/opendatascience/2331

Created: 2025-05-29
Last Update: 2025-05-30 22:47:47


